Processor and methods to reduce power consumption of processor components

ABSTRACT

Periods of futile activity by one or more logic circuits of a component of a processor may be predicted, and then during each such period, one or more of the logic circuits may operate in a power-save state with reduced power consumption, with the latter part of the period being used to bring the logic circuits back into performance state, so that performance is not diminished beyond an acceptable level due to the power-save state. The decision of whether to reduce the power consumption of a particular logic circuit of a particular processor component at a particular future time is made internally in the particular processor component based on one or more signals received by the particular processor component.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 10/682,892, filed on Oct. 14, 2003, which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

Computer designs may be directed to save energy. In particular, in portable devices the length of the battery life is of importance. One area in which a designer may deal with power saving is the operation of the processor core.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals indicate corresponding, analogous or similar elements, and in which:

FIG. 1 is a block diagram of an apparatus having a processor according to an embodiment of the invention, the processor having an out-of-order subsystem that has a reservation station and a reorder buffer;

FIG. 2 is a flowchart illustration of an exemplary method for setting the state of operation of the dispatch logic circuitry of the reservation station, according to an embodiment of the invention;

FIG. 3 is a flowchart illustration of an exemplary method for setting the state of operation of the retire logic circuitry of the reorder buffer, according to an embodiment of the invention; and

FIG. 4 is an illustration of an exemplary finite state machine, according to an embodiment of the invention.

It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However it will be understood by those of ordinary skill in the art that the embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail for clarity.

According to some embodiments of the invention, periods of futile activity by one or more components of a processor may be predicted, and then during at least a portion of each such period, the component may operate in a power-save state, with the latter part of the period being used to bring the component back into performance state. In the power-save state, the power consumption of the component is reduced relative to the power consumption of the component in performance state.

In general, several conditions may need to be satisfied before the component is taken into power-save state. Similarly, recovery from power-save state to performance state may involve several stages.

In some embodiments of the invention, the conditions and stages may be designed so that the transition into and recovery from power-save state are substantially transparent and do not adversely affect the performance of the processor or of the apparatus in which the processor is installed beyond an acceptable level.

Embodiments of the invention will be described for particular examples of processor components. Then embodiments of the invention will be described for a general processor component.

Embodiments of the invention may be used in any apparatus having a processor. For example, the apparatus may be a portable device that may be powered by a battery. A non-exhaustive list of examples of such portable devices includes laptop and notebook computers, mobile telephones, personal digital assistants (PDA), and the like. Alternatively, the apparatus may be a non-portable device, such as, for example, a desktop computer or a server computer.

As shown in FIG. 1, an apparatus 2 may include a processor 4 and a system memory 6, and may optionally include a voltage monitor 8. Well-known components and circuits of apparatus 2 and of processor 4 are not shown in FIG. 1 for clarity.

Design considerations, such as, but not limited to, processor performance, cost and power consumption, may result in a particular processor design, and it should be understood that the design of processor 4 shown in FIG. 1 is merely an example and that embodiments of the invention are applicable to other processor designs as well. A non-exhaustive list of examples for processor 4 includes a central processing unit (CPU), a digital signal processor (DSP), a reduced instruction set computer (RISC), a complex instruction set computer (CISC) and the like. Moreover, processor 4 may be part of an application specific integrated circuit (ASIC) or may be part of an application specific standard product (ASSP).

A non-exhaustive list of examples for system memory 6 includes a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a flash memory, a double data rate (DDR) memory, RAMBUS dynamic random access memory (RDRAM) and the like. Moreover, system memory 6 may be part of an application specific integrated circuit (ASIC) or may be part of an application specific standard product (ASSP).

System memory 6 may store macroinstructions to be executed by processor 4. Macroinstructions retrieved from system memory 6 may be stored temporarily in an instruction cache memory 10 of processor 4. System memory 6 may also store data for the macroinstructions, or the data may be stored elsewhere. Data for the macroinstructions retrieved from system memory 6 or elsewhere may be stored temporarily in a data cache memory 12 of processor 4.

A processor having more than one execution unit (EU) 14 may employ out-of-order techniques in order to use the execution units in an efficient manner. An instruction decoder 16 may decode a macroinstruction into one or more micro-operations (“u-ops”) depending on the type of macroinstruction or according to some other criterion. Instruction decoder 16 may assign a unique identification number to each u-op. Each u-op may be executed by an out-of-order (OOO) subsystem 18 of the processor. OOO subsystem 18 enables more than one u-op to be executed at the same time, although the u-ops may be executed in a different order than the order in which they were received by OOO subsystem 18.

Processor 4 may include a real register file (RRF) 20 for storing execution results of u-ops in the order in which the u-ops were received by OOO subsystem 18 (storing the execution result of a u-op in RRF 20 is called “retiring” the u-op). Execution results of u-ops may be stored temporarily in OOO subsystem 18 until such time as those results may be stored in RRF 20.

Processor 4 may include a register alias table and allocation unit (RAT/ALLOC) 22. RAT/ALLOC 22 may allocate temporary registers (not shown) of OOO subsystem 18 as the destinations of u-ops received from instruction decoder 16, to store the results of the u-ops until the results are retired. RAT/ALLOC 22 may also identify where the sources of u-ops received from instruction decoder 16 are, and may rename the sources as necessary. A u-op may include one or more operands and one or more “op-codes”, where an op-code is a field of the u-op that defines the type of operation to be performed on some or all of the operands. RAT/ALLOC 22 may also assign for each op-code which of EU(s) 14 is to execute the op-code.

At each cycle of a clock 24, instruction decoder 16 may receive up to three macroinstructions from instruction cache memory 10 and may output one, two or three u-ops from previously received macroinstructions. At each cycle of clock 24, RAT/ALLOC 22 may receive at most three u-ops from instruction decoder 16 and may output to OOO subsystem 18 at most three allocated/renamed u-ops and their corresponding EU assignments. (In other embodiments, the limit of macroinstructions received by the instruction decoder per clock cycle from the instruction cache memory may be other than three. Similarly, in other embodiments, the limit of u-ops output by the instruction decoder per clock cycle may be other than three. In other embodiments, the limit of u-ops received by the RAT/ALLOC per clock cycle from the instruction decoder may be other than three. Similarly, in other embodiments, the limit of allocated/renamed u-ops and corresponding EU assignments output by the RAT/ALLOC per clock cycle to OOO subsystem 18 may be other than three.)

The Reservation Station

OOO subsystem 18 may include a reservation station 26 that, at each cycle of clock 24, may receive from RAT/ALLOC 22 and store internally the op-codes, the identification numbers and the EU assignments of at most three allocated/renamed u-ops. (In other embodiments, the limit of u-ops received by reservation station 26 per clock cycle may be other than three.) The operands for a u-op may be received by reservation station 26 at a different cycle of clock 24 than the cycle at which op-code(s), identification number and EU assignment(s) of that u-op are received. Reservation station 26 may receive an operand from instruction decoder 16 or, in the case of an operand that is an execution result of another u-op, from one of execution units 14 via a write-back (WB) bus 30.

Once all of the operands for a particular u-op have been received, the u-op is “valid for dispatching”. Dispatch logic circuitry 262 of reservation station 26 may dispatch the particular u-op to the assigned one or more EUs via signals 32 only if certain resources are available. A non-exhaustive list of the resources reservation station 26 may check for availability includes the assigned EU(s), signals 32, and write-back bus 30. Reservation station 26 may check that the assigned one or more EUs are available to execute the one or more op-codes of the particular u-op, that signals 32 have the capacity to carry the op-codes, operands and identification numbers of the particular u-op, and that write-back bus 30 will be available to carry the execution results of the particular u-op once the results are calculated.

Reservation station 26 may store and handle more than one u-op at a time. The conditions for execution of one u-op may be fulfilled before the conditions for execution of a u-op that was received earlier. Consequently, u-ops may be dispatched and executed in an order that may be different from the order in which they were received by OOO subsystem 18.
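
By way of illustration only, the out-of-order dispatching described above may be modeled in software as in the following minimal Python sketch. The names UOp, dispatch_cycle and operand_arrived, and the operand label "r7", are hypothetical and do not appear in the figures. The sketch assumes that any free execution unit may execute any u-op, and it does not model EU assignments, signals 32 or write-back bus 30.

```python
from dataclasses import dataclass, field


@dataclass
class UOp:
    """Hypothetical software stand-in for a u-op held by the reservation station."""
    uop_id: int                                          # identification number
    pending_operands: set = field(default_factory=set)   # operands not yet received

    def valid_for_dispatching(self) -> bool:
        # A u-op is "valid for dispatching" once all of its operands have arrived.
        return not self.pending_operands


def dispatch_cycle(stored, free_eus, max_per_cycle=5):
    """Dispatch valid u-ops in stored order, limited by the number of free
    execution units and by the per-cycle limit. Returns the dispatched ids."""
    limit = min(free_eus, max_per_cycle)
    dispatched = []
    for uop in list(stored):          # iterate over a copy; `stored` is mutated
        if len(dispatched) >= limit:
            break
        if uop.valid_for_dispatching():
            stored.remove(uop)
            dispatched.append(uop.uop_id)
    return dispatched


def operand_arrived(stored, operand):
    """Model an execution result arriving for every u-op that waits on it."""
    for uop in stored:
        uop.pending_operands.discard(operand)


# U-ops 1 and 3 wait on the same operand; u-op 2 was received later than
# u-op 1 but already has all of its operands.
stored = [UOp(1, {"r7"}), UOp(2), UOp(3, {"r7"})]
first = dispatch_cycle(stored, free_eus=2)    # u-op 2 is dispatched ahead of u-op 1
operand_arrived(stored, "r7")
second = dispatch_cycle(stored, free_eus=2)   # u-ops 1 and 3 are now valid
```

In this sketch, u-op 2 is dispatched before u-op 1 even though u-op 1 was received first, which is the reordering described above.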

“Fast” and “Slow” U-Ops

A u-op may be categorized as a “fast” or “slow” u-op, referring to the number of cycles of clock 24 that pass between the time an execution unit 14 receives a u-op from signals 32 and the time the same execution unit 14 outputs the execution result on write-back bus 30. Adding two integers is an example of a fast u-op, and an integer execution unit may require, for example, one, two, three, four or five cycles of clock 24 to execute the fast integer u-op. The number of cycles required to execute a fast u-op may depend upon the type of u-op.

Dividing a floating point number by another floating point number is an example of a slow u-op, and a floating point execution unit may require a constant but large number of cycles of clock 24 (for example, forty-three, forty-four or forty-five cycles) to execute the u-op. Fetching an operand from data cache memory 12 is another example of a slow u-op, and a load execution unit may require an unpredictable number of cycles of clock 24 to execute the u-op, since if there is a cache miss, the operand will need to be fetched from system memory 6 to data cache memory 12 before it is fetched to the load execution unit. An execution unit executing a slow u-op may send the identification number of the u-op on signals 34 several cycles of clock 24 before the execution unit sends the execution results of the u-op on write-back bus 30.

Setting the State of Operation of the Dispatch Logic Circuitry of the Reservation Station

Dispatch logic circuitry 262 may have a performance state of operation, and a power-save state of operation that may consume less power than the performance state. Reservation station 26 may include control circuitry 264 to set the state of dispatch logic circuitry 262 via a signal 266 based upon the internal state of reservation station 26 and information received on signals 34 from EUs 14.

FIG. 2 is a flowchart illustration of an exemplary method of setting the state of dispatch logic circuitry 262. When OOO subsystem 18 is powered up, control circuitry 264 may set dispatch logic circuitry 262 to performance state (-210-). When dispatch logic circuitry 262 is in performance state (-210-), it may dispatch up to five u-ops in each cycle of clock 24 to execution units 14, according to availability of op-codes, operands and resources (-212-). (In other embodiments, the limit of u-ops dispatched by dispatch logic circuitry 262 per clock cycle may be other than five.) If at least one of the u-ops stored by reservation station 26 is “valid for dispatching” (-214-), control circuitry 264 may keep dispatch logic circuitry 262 in performance state, and dispatch logic circuitry 262 may continue to dispatch u-ops to execution units 14.

If none of the u-ops stored by reservation station 26 is “valid for dispatching” (-214-), control circuitry 264 may check the status of fast u-ops that are currently being executed by execution units 14 (-218-).

If there are fast u-ops executed by execution units 14, control circuitry 264 may return to -214- to check whether there are any “valid for dispatching” u-ops. If there are no fast u-ops executed (-218-), control circuitry 264 may set dispatch logic circuitry 262 to power-save state (-200-).

Dispatch logic circuitry 262 remains in power-save state during the execution of slow u-ops, and remains in power-save state until a completion indication is received from a slow u-op by reservation station 26 (-202-), or until a “valid for dispatching” u-op is allocated. While dispatch logic circuitry 262 is in power-save state, reservation station 26 may store a u-op that is “valid for dispatching” on allocation, or a u-op that will become “valid for dispatching” once its operand, which is the execution result of a slow u-op, is received by reservation station 26. In the first case (-202-), dispatch logic circuitry 262 will exit power-save state after an appropriate number of clock cycles (-208-). In the latter case, dispatch logic circuitry 262 will remain in power-save state until the slow u-op has completed. Control circuitry 264 may receive the identification number of the slow u-op on signals 34 several cycles of clock 24 before the execution result of the slow u-op is sent on write-back bus 30 (-203-). Therefore, in such a situation, control circuitry 264 may wait an appropriate number of cycles of clock 24 (-208-) and then may set dispatch logic circuitry 262 to performance state (-210-). The appropriate number of cycles to wait is such that no performance of dispatch logic circuitry 262 or reservation station 26 is lost compared to a situation in which there is no power-save state for dispatch logic circuitry 262.

Dispatch logic circuitry 262 may comprise sub-blocks (not shown) that may be powered separately. For example, once control circuitry 264 receives the identification number of a slow u-op on signals 34, control circuitry 264 may set dispatch logic circuitry 262 into a partial power-save state (-204-), in which one or more sub-blocks that were not powered in the power-save state are now powered. The sub-blocks receiving power in the partial power-save state of dispatch logic circuitry 262 may include, for example, a counter to enable waiting the appropriate number of cycles as above. FIG. 2 shows an ellipsis 206 to show that there may be other indications that control circuitry 264 is monitoring, and that when such other indications are identified by control circuitry 264, additional sub-blocks of dispatch logic circuitry 262 may be powered.

The exemplary method of FIG. 2 demonstrates that control circuitry 264 may set dispatch logic circuitry 262 to a power-save state if certain conditions are satisfied, the conditions ensuring that the performance of processor 4 will not be adversely affected during the transition into and recovery from power-save state. By reading the internal state of reservation station 26 (e.g. none of the u-ops stored by reservation station 26 are “valid for dispatching”) and by monitoring incoming signals, control circuitry 264 may predict periods during which the activities of dispatch logic circuitry 262 (namely, dispatching u-ops) are futile and may set dispatch logic circuitry 262 to power-save state. Similarly, by monitoring incoming signals (e.g. the identification number of a slow u-op several cycles before the execution results are going to be sent, or the allocation of a u-op that is “valid for dispatching” upon allocation), control circuitry 264 may predict periods during which the activities of dispatch logic circuitry 262 are not futile, and may set dispatch logic circuitry 262 to performance state. The increase in power consumption of dispatch logic circuitry 262 from power-save state may be incremental.
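
By way of illustration only, the exemplary method of FIG. 2 may be approximated by a small software state machine, as in the following Python sketch. The class name DispatchControl, the parameter wake_delay and the three Boolean inputs are hypothetical. The value chosen for wake_delay is an assumption, since the appropriate number of cycles is implementation dependent. The sketch also merges the two wake-up causes (the identification number of a slow u-op on signals 34, and the allocation of a “valid for dispatching” u-op) into a single input.

```python
from enum import Enum


class State(Enum):
    PERFORMANCE = "performance"
    POWER_SAVE = "power-save"
    PARTIAL = "partial power-save"   # counter sub-block powered, waiting to wake


class DispatchControl:
    """Hypothetical software model of control circuitry 264 (not the circuit itself)."""

    def __init__(self, wake_delay=2):
        self.state = State.PERFORMANCE   # state set at power-up (-210-)
        self.wake_delay = wake_delay     # assumed "appropriate number of cycles"
        self.countdown = 0

    def step(self, any_valid, fast_in_flight, wake_signal):
        """Advance one clock cycle and return the resulting state."""
        if self.state is State.PERFORMANCE:
            # (-214-) and (-218-): enter power-save only when no u-op is valid
            # for dispatching and no fast u-op is being executed.
            if not any_valid and not fast_in_flight:
                self.state = State.POWER_SAVE          # (-200-)
        elif self.state is State.POWER_SAVE:
            if wake_signal:
                self.state = State.PARTIAL             # (-204-)
                self.countdown = self.wake_delay
        else:
            self.countdown -= 1                        # waiting (-208-)
            if self.countdown <= 0:
                self.state = State.PERFORMANCE         # (-210-)
        return self.state


ctl = DispatchControl(wake_delay=2)
inputs = [
    (True, False, False),    # a valid u-op exists: stay in performance state
    (False, True, False),    # nothing valid, but a fast u-op is executing
    (False, False, False),   # nothing valid, nothing fast: power-save
    (False, False, False),   # remain in power-save
    (False, False, True),    # wake-up indication received
    (False, False, False),   # first cycle of waiting
    (False, False, False),   # second cycle of waiting: back to performance
]
trace = [ctl.step(*cycle).value for cycle in inputs]
```

The trace shows the incremental recovery: the model passes through the partial power-save state for wake_delay cycles before it returns to performance state.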

The Reorder Buffer

Referring back to FIG. 1, OOO subsystem 18 may include a reorder buffer (ROB) 28 to temporarily store execution results until they are stored in real register file 20 in the order in which the u-ops were received by OOO subsystem 18. Reorder buffer 28 may receive execution results from execution units 14 on write-back bus 30, and the identification numbers of the corresponding u-ops on signals 34. Reorder buffer 28 may store internally the identification numbers and execution results until the u-ops are retired to real register file 20. For each u-op, reorder buffer 28 may receive the identification number on signals 34 several cycles of clock 24 before the execution results are received on write-back bus 30.

A particular u-op is “valid for retiring” if its execution results have been received by reorder buffer 28 and other conditions, if any, have been satisfied. A retire logic circuitry 282 of reorder buffer 28 may then retire the “valid for retiring” u-ops according to the original order of u-ops and store their execution results in real register file 20.

At each cycle of clock 24, retire logic circuitry 282 may retire at most three “valid for retiring” u-ops. (In other embodiments, the limit of u-ops retired by retire logic circuitry 282 per clock cycle may be other than three.) No u-ops will be retired until the u-op that is next to be retired according to the original order of u-ops is “valid for retiring”. In such a situation, reorder buffer 28 may be able to save power without sacrificing performance.
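
By way of illustration only, the in-order retirement rule described above may be modeled as in the following Python sketch. The function name retire_cycle and the example identification numbers are hypothetical. The sketch treats a u-op as “valid for retiring” as soon as its identification number is in the set of completed u-ops, and ignores any other conditions.

```python
from collections import deque


def retire_cycle(rob, completed, max_per_cycle=3):
    """Retire u-ops from the head of the reorder buffer, in original order,
    stopping at the first u-op whose results have not yet been received."""
    retired = []
    while rob and len(retired) < max_per_cycle and rob[0] in completed:
        retired.append(rob.popleft())
    return retired


rob = deque([10, 11, 12, 13, 14])   # identification numbers in original order
completed = {11, 12}                # results received out of order

stalled = retire_cycle(rob, completed)   # head u-op 10 is not complete yet
completed.add(10)
burst = retire_cycle(rob, completed)     # up to three u-ops retire together
```

In the first cycle nothing retires even though two u-ops have completed, because the head u-op has not. This is the kind of period during which the activity of the retire logic is futile.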

Setting the State of Operation of the Retire Logic Circuitry of the Reorder Buffer

Retire logic circuitry 282 may have a performance state of operation, and a power-save state of operation that may have a lower performance and consume less power than the performance state. Reorder buffer 28 may include control circuitry 284 to set the state of retire logic circuitry 282 via a signal 286 based upon the internal state of reorder buffer 28 and information received on signals 34 from EUs 14.

FIG. 3 is a flowchart illustration of an exemplary method of setting the state of retire logic circuitry 282. When OOO subsystem 18 is powered up, control circuitry 284 may set retire logic circuitry 282 to performance state (-308-).

In performance state, retire logic circuitry 282 may retire up to three u-ops in each cycle of clock 24 to real register file 20, according to availability of execution results (-312-). (In other embodiments, the limit of u-ops retired by retire logic circuitry 282 per clock cycle to real register file 20 may be other than three.) If reorder buffer 28 stores at least one u-op, and if the next u-op to be retired according to the original order of u-ops is “valid for retiring” (-316-), control circuitry 284 may keep retire logic circuitry 282 in performance state, and retire logic circuitry 282 may continue to retire u-ops to real register file 20 (-312-).

If reorder buffer 28 stores at least one u-op, but the next u-op to be retired is not “valid for retiring” (-316-), control circuitry 284 may then check whether write-back (WB) bus 30 is carrying data intended for the next u-op to be retired (-317-). If so, then after waiting an appropriate number of cycles (-318-), the next u-op to be retired will become “valid for retiring” and retire logic circuitry 282 will retire one or more u-ops to real register file 20 (-312-). If not, then control circuitry 284 may set retire logic circuitry 282 to power-save state (-300-).

When retire logic circuitry 282 is in power-save state (-300-), at least one u-op is waiting to be retired. Therefore, control circuitry 284 may monitor signals 34 and may wait to receive an identification number that matches the identification number of the u-op that is next to be retired according to the original order of u-ops (-304-). When this identification number is received, reorder buffer 28 may wait the appropriate number of cycles of clock 24 (-309-), and may then set retire logic circuitry 282 to performance state (-308-).

The exemplary method of FIG. 3 demonstrates that control circuitry 284 may set retire logic circuitry 282 to a power-save state if certain conditions are satisfied, the conditions ensuring that the performance of processor 4 will not be adversely affected during the transition into and recovery from power-save state. By reading the internal state of reorder buffer 28 (e.g. that the next u-op to be retired is not “valid for retiring”) and by monitoring incoming signals, control circuitry 284 may predict periods during which the activities of retire logic circuitry 282 (namely, retiring u-ops) are futile and may set retire logic circuitry 282 to power-save state. Similarly, by monitoring incoming signals (e.g. receiving the identification number of the next u-op to be retired before the execution results of that u-op are going to be sent), control circuitry 284 may predict periods during which the activities of retire logic circuitry 282 are not futile, and may set retire logic circuitry 282 to performance state. The increase in power consumption of retire logic circuitry 282 from power-save state may be incremental.
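
By way of illustration only, the exemplary method of FIG. 3 may be simulated cycle by cycle, as in the following Python sketch, in order to count the cycles spent in power-save state. The function names cycle and simulate_retire_control, the parameter wake_delay and the per-cycle Boolean inputs are hypothetical. The value of wake_delay is an assumption and must be at least one. The sketch assumes that reorder buffer 28 always stores at least one u-op.

```python
def cycle(head_valid=False, wb_has_head_data=False, head_id_seen=False):
    """Hypothetical per-cycle observations available to control circuitry 284."""
    return {
        "head_valid": head_valid,                # next u-op is "valid for retiring" (-316-)
        "wb_has_head_data": wb_has_head_data,    # WB bus carries its data (-317-)
        "head_id_seen": head_id_seen,            # its id was seen on signals 34 (-304-)
    }


def simulate_retire_control(cycles, wake_delay=2):
    """Return the state of the retire logic after each clock cycle."""
    state = "performance"     # state set at power-up (-308-)
    wait_left = None          # None until the wake-up countdown has started
    trace = []
    for obs in cycles:
        if state == "performance":
            if not obs["head_valid"] and not obs["wb_has_head_data"]:
                state = "power-save"             # (-300-)
        elif wait_left is None:
            if obs["head_id_seen"]:
                wait_left = wake_delay           # start waiting (-309-)
        else:
            wait_left -= 1
            if wait_left == 0:
                state = "performance"            # (-308-)
                wait_left = None
        trace.append(state)
    return trace


timeline = [
    cycle(head_valid=True),          # retiring normally
    cycle(wb_has_head_data=True),    # data for the next u-op is already on the bus
    cycle(),                         # nothing to retire: enter power-save
    cycle(),                         # still waiting for a slow u-op
    cycle(head_id_seen=True),        # early identification number arrives
    cycle(),                         # first cycle of waiting
    cycle(),                         # second cycle of waiting: back to performance
]
trace = simulate_retire_control(timeline, wake_delay=2)
power_save_cycles = trace.count("power-save")
```

Under these assumptions, four of the seven simulated cycles are spent in power-save state, and the model is back in performance state by the last cycle of the timeline.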

Setting the State of Operation of One or More Logic Circuits of a Component of the Processor for which Periods of Futile Activity may be Predicted

In more general terms, any component of processor 4 for which periods of futile activity may be predicted may be designed to include control circuitry to predict such periods and to put one or more logic circuits of the component into a power-save state of operation. The control circuitry may read the internal status of the processor component and monitor input signals to the processor component in order to determine when to put the one or more logic circuits of the component into power-save state. Once in power-save state, the control circuitry may monitor input signals to the processor component in order to determine when to recover to a performance state. A finite state machine such as the exemplary finite state machine shown in FIG. 4 may be implemented to ensure that the performance of the processor and the processor component are not adversely affected by putting one or more logic circuits of the processor component into power-save state.

It should be noted that the performance of a processor such as exemplary processor 4 is dependent upon a random sequence of macroinstructions received by the processor. Therefore the times at which a logic circuit of a processor component will have futile activity are also random.

Although the foregoing description uses the example of a processor, embodiments of the present invention are equally applicable to integrated circuits having other logic circuits, for example, bus controllers, timers, and other such peripherals. Embodiments of the present invention may be applied to any integrated circuit having a logic circuit that has idle states that may be predicted.

While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.

1. A method comprising: determining that a logic block within a microprocessor is not presently able to perform an operation; and placing the logic block in a low power state.
2. The method of claim 1, further comprising monitoring at least one signal to determine a future time when the logic block will be able to perform the operation, and at the future time, placing the logic block in a performance state.
3. The method of claim 2, further comprising placing at least one sub-block within the logic block in a powered state before placing the logic block in the performance state.
4. The method of claim 3, wherein the at least one sub-block is a counter.
5. The method of claim 2, wherein the logic block is a dispatch logic block.
6. The method of claim 5, wherein the operation is dispatching of at least one micro-operation.
7. The method of claim 5, wherein determining that the dispatch logic block is not presently able to perform an operation comprises determining that there are no micro-operations valid to dispatch.
8. The method of claim 7, wherein determining that the dispatch logic block is not presently able to perform the operation further comprises determining that there are no fast micro-operations in progress.
9. The method of claim 5, wherein monitoring at least one signal to determine a future time when the dispatch logic block will be able to perform the operation comprises receiving a completion indication for a slow micro-operation.
10. The method of claim 9, wherein monitoring at least one signal to determine a future time when the dispatch logic block will be able to perform the operation further comprises receiving a valid for dispatching micro-operation.
11. The method of claim 2, wherein the logic block is a retire logic block.
12. The method of claim 11, wherein the operation is retiring at least one micro-operation.
13. The method of claim 11, wherein determining that the retire logic block is not presently able to perform the operation comprises determining that there are no valid for retiring micro-operations.
14. The method of claim 13, wherein determining that the retire logic block is not presently able to perform an operation further comprises determining that a write back bus does not have data intended for a next micro-operation.
15. The method of claim 11, wherein monitoring at least one signal to determine a future time when the retire logic block will be able to perform the operation comprises receiving an identification number associated with a next micro-operation to be retired according to an original order of micro-operations.
16. An apparatus, comprising: a logic block; and control circuitry coupled to the logic block, the control circuitry to determine when the logic block is unable to perform an operation and to subsequently place the logic block in a low power state.
17. The apparatus of claim 16, wherein the control circuitry is further to monitor at least one signal to determine a time when the logic block is able to perform an operation, and at that time, to place the logic block in a performance state.
18. The apparatus of claim 17, wherein the at least one signal is a signal to be provided by an execution unit coupled to the logic block and to the control circuitry.
19. The apparatus of claim 17, wherein the control circuitry is further to place at least one sub-block within the logic block in a powered state prior to the time when the logic block is able to perform an operation.
20. The apparatus of claim 19, wherein the at least one sub-block is a counter.
21. The apparatus of claim 16, wherein the logic block is a dispatch logic block and the operation is a dispatch operation.
22. The apparatus of claim 16, wherein the logic block is a retire logic block and the operation is a retire operation.